Operations on Data Frames

Let me get the data on dengue from Peru:

In [1]:
import pandas as pd
linkData="https://github.com/SocialAnalytics-StrategicIntelligence/TableOperations/raw/main/dengue_ok.pkl"

dengue = pd.read_pickle(linkData)

# checking format
dengue.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 501236 entries, 0 to 501235
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   departamento  501236 non-null  object  
 1   provincia     501236 non-null  object  
 2   distrito      501236 non-null  object  
 3   ano           501236 non-null  int64   
 4   semana        501236 non-null  int64   
 5   sexo          501236 non-null  object  
 6   edad          501236 non-null  int64   
 7   enfermedad    501236 non-null  category
 8   case          501236 non-null  int64   
dtypes: category(1), int64(4), object(4)
memory usage: 31.1+ MB
In [2]:
# Each row is a person:
dengue.head()
Out[2]:
departamento provincia distrito ano semana sexo edad enfermedad case
0 HUANUCO LEONCIO PRADO LUYANDO 2000 47 M 9 SIN_SEÑALES 1
1 HUANUCO LEONCIO PRADO LUYANDO 2000 40 F 18 SIN_SEÑALES 1
2 HUANUCO LEONCIO PRADO JOSE CRESPO Y CASTILLO 2000 48 F 32 SIN_SEÑALES 1
3 HUANUCO LEONCIO PRADO JOSE CRESPO Y CASTILLO 2000 37 F 40 SIN_SEÑALES 1
4 HUANUCO LEONCIO PRADO MARIANO DAMASO BERAUN 2000 42 M 16 SIN_SEÑALES 1
In [3]:
# some exploration
dengue.describe().apply(lambda s: s.apply('{0:.5f}'.format))
Out[3]:
ano semana edad case
count 501236.00000 501236.00000 501236.00000 501236.00000
mean 2014.77213 21.99838 28.96143 1.00000
std 6.14646 14.76658 18.15954 0.00000
min 2000.00000 1.00000 0.00000 1.00000
25% 2011.00000 11.00000 15.00000 1.00000
50% 2016.00000 18.00000 26.00000 1.00000
75% 2020.00000 32.00000 41.00000 1.00000
max 2022.00000 53.00000 106.00000 1.00000
In [4]:
# exploring
dengue.enfermedad.value_counts()
Out[4]:
enfermedad
SIN_SEÑALES    443996
ALARMA          54981
GRAVE            2259
Name: count, dtype: int64

Let's make better labels, prefixed with numbers so they sort in severity order:

In [5]:
dengue['enfermedad_text']=dengue.enfermedad.astype(str)

dengue.replace({'enfermedad_text':{'SIN_SEÑALES':'1_SIN_SEÑALES','ALARMA':'2_ALARMA','GRAVE':'3_GRAVE'}},inplace=True)
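The dictionary-based `replace` above works on the new text column. As a sketch of an alternative (on hypothetical toy values, not the dengue frame itself), `cat.rename_categories` relabels a categorical column directly while keeping its dtype:

```python
import pandas as pd

# toy stand-in for the 'enfermedad' column (hypothetical values)
s = pd.Series(['SIN_SEÑALES', 'ALARMA', 'GRAVE', 'ALARMA'], dtype='category')

# rename the categories instead of using replace(); the categorical dtype survives
s2 = s.cat.rename_categories({'SIN_SEÑALES': '1_SIN_SEÑALES',
                              'ALARMA': '2_ALARMA',
                              'GRAVE': '3_GRAVE'})
assert str(s2.dtype) == 'category'
```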
In [6]:
# exploring
dengue.ano.value_counts(sort=False)
Out[6]:
ano
2000     5557
2001    23526
2002     8086
2003     3349
2004     9547
2005     5640
2006     4022
2007     6344
2008    12824
2009    13407
2010    16842
2011    28084
2012    28505
2013    13092
2015    35816
2014    17234
2016    25160
2017    68279
2018     4698
2019    15287
2020    47932
2021    44791
2022    63214
Name: count, dtype: int64

Discretizing:

In [7]:
binLimits=[0,15,50,110]
theLabels=["a_menor_a_16","b_entre_16y50","c_mayor_a_50"]
dengue["edad_grupos"]=pd.cut(dengue['edad'], include_lowest=True,
                                     bins=binLimits, 
                                     labels=theLabels,
                                     ordered=True)

# see

dengue.head()
Out[7]:
departamento provincia distrito ano semana sexo edad enfermedad case enfermedad_text edad_grupos
0 HUANUCO LEONCIO PRADO LUYANDO 2000 47 M 9 SIN_SEÑALES 1 1_SIN_SEÑALES a_menor_a_16
1 HUANUCO LEONCIO PRADO LUYANDO 2000 40 F 18 SIN_SEÑALES 1 1_SIN_SEÑALES b_entre_16y50
2 HUANUCO LEONCIO PRADO JOSE CRESPO Y CASTILLO 2000 48 F 32 SIN_SEÑALES 1 1_SIN_SEÑALES b_entre_16y50
3 HUANUCO LEONCIO PRADO JOSE CRESPO Y CASTILLO 2000 37 F 40 SIN_SEÑALES 1 1_SIN_SEÑALES b_entre_16y50
4 HUANUCO LEONCIO PRADO MARIANO DAMASO BERAUN 2000 42 M 16 SIN_SEÑALES 1 1_SIN_SEÑALES b_entre_16y50
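To make the bin boundaries explicit, here is a minimal sketch of how `pd.cut` assigns the edge ages with these limits (toy ages, not the real data):

```python
import pandas as pd

ages = pd.Series([0, 15, 16, 50, 51, 106])  # edge cases for the bins
groups = pd.cut(ages,
                bins=[0, 15, 50, 110],       # intervals (0,15], (15,50], (50,110]
                labels=["a_menor_a_16", "b_entre_16y50", "c_mayor_a_50"],
                include_lowest=True,         # age 0 falls into the first bin
                ordered=True)
print(groups.tolist())
# ['a_menor_a_16', 'a_menor_a_16', 'b_entre_16y50',
#  'b_entre_16y50', 'c_mayor_a_50', 'c_mayor_a_50']
```

Note the bins are right-closed: age 15 lands in the first group, age 16 in the second.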

A first overview, as cross-tabulations:

In [8]:
pd.crosstab( dengue.enfermedad_text,dengue.edad_grupos, dropna=False, normalize='columns')
Out[8]:
edad_grupos a_menor_a_16 b_entre_16y50 c_mayor_a_50
enfermedad_text
1_SIN_SEÑALES 0.876868 0.890247 0.884549
2_ALARMA 0.119010 0.105475 0.109204
3_GRAVE 0.004122 0.004278 0.006248
In [9]:
pd.crosstab(dengue.enfermedad_text,[dengue.sexo,dengue.edad_grupos], dropna=False, normalize='columns')
Out[9]:
sexo F M
edad_grupos a_menor_a_16 b_entre_16y50 c_mayor_a_50 a_menor_a_16 b_entre_16y50 c_mayor_a_50
enfermedad_text
1_SIN_SEÑALES 0.875221 0.884646 0.881328 0.878431 0.897139 0.888295
2_ALARMA 0.120614 0.110622 0.113616 0.117488 0.099142 0.104073
3_GRAVE 0.004165 0.004731 0.005057 0.004081 0.003720 0.007633
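The `normalize` argument controls which margin the shares are computed over. A small sketch on hypothetical mini data:

```python
import pandas as pd

df = pd.DataFrame({'sexo':  ['F', 'F', 'M', 'M', 'M'],
                   'grupo': ['a', 'b', 'a', 'a', 'b']})

by_col = pd.crosstab(df.sexo, df.grupo, normalize='columns')  # each column sums to 1
by_row = pd.crosstab(df.sexo, df.grupo, normalize='index')    # each row sums to 1
overall = pd.crosstab(df.sexo, df.grupo, normalize='all')     # whole table sums to 1
print(by_col.round(2))
```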

Yearly look

In [10]:
# for colab
# !pip install altair -U
# # !pip install "vegafusion-jupyter[embed]"
In [11]:
import altair as alt
alt.data_transformers.enable("vegafusion")
Out[11]:
DataTransformerRegistry.enable('vegafusion')
In [12]:
alt_dengue=alt.Chart(dengue)

enc_dengue=alt_dengue.encode(
    x='ano:T',
    y='mean(edad):Q',
    color='enfermedad_text:N',
)

enc_dengue.mark_line() + enc_dengue.mark_errorband()
Out[12]:

More detailed:

In [13]:
enc_dengue=alt_dengue.encode(
    x='ano:T',
    y='median(edad):Q',
    color='enfermedad_text:N',
    tooltip=['median(edad)','ano:T']
).interactive()

enc_dengue.mark_line().facet(
    row='sexo:N',
    column='edad_grupos:N'
) 
Out[13]:
In [14]:
enc_dengue=alt_dengue.encode(
    x='ano:T',
    y=alt.Y('sum(case):Q'),
    color='enfermedad_text:N',
    tooltip=['sum(case):Q','ano:T']
).interactive()
enc_dengue.mark_line().facet(
    row='sexo:N',
    column='edad_grupos:N'
)
Out[14]:

The previous plot may benefit from a log-scaled Y-axis:

In [15]:
enc_dengue=alt_dengue.encode(
    x='ano:T',
    y=alt.Y('sum(case):Q', scale=alt.Scale(type='log')),
    color='enfermedad_text:N',
    tooltip=['sum(case):Q','ano:T']
).interactive()

enc_dengue.mark_line().facet(
    row='sexo:N',
    column='edad_grupos:N'
)
Out[15]:

Let's get the same results in tables:

In [16]:
indexList=['edad_grupos','ano','sexo','enfermedad_text']
aggregator={'edad': ['median']}
LevelByYear_medians=dengue.groupby(indexList,observed=True).agg(aggregator)
LevelByYear_medians
Out[16]:
edad
median
edad_grupos ano sexo enfermedad_text
a_menor_a_16 2000 F 1_SIN_SEÑALES 10.0
M 1_SIN_SEÑALES 10.0
2001 F 1_SIN_SEÑALES 10.0
3_GRAVE 12.0
M 1_SIN_SEÑALES 10.0
... ... ... ... ...
c_mayor_a_50 2022 F 2_ALARMA 60.0
3_GRAVE 75.5
M 1_SIN_SEÑALES 60.0
2_ALARMA 62.0
3_GRAVE 70.0

336 rows × 1 columns

In [17]:
LevelByYear_medians.unstack(['sexo','enfermedad_text'])
Out[17]:
edad
median
sexo F M
enfermedad_text 1_SIN_SEÑALES 2_ALARMA 3_GRAVE 1_SIN_SEÑALES 2_ALARMA 3_GRAVE
edad_grupos ano
a_menor_a_16 2000 10.0 NaN NaN 10.0 NaN NaN
2001 10.0 NaN 12.0 10.0 NaN 11.5
2002 10.0 NaN 11.5 10.0 12.0 12.0
2003 11.0 NaN NaN 11.0 NaN NaN
2004 10.0 NaN 12.0 10.0 NaN 11.0
... ... ... ... ... ... ... ...
c_mayor_a_50 2018 59.0 58.0 60.0 59.0 60.5 57.0
2019 59.0 58.0 64.0 60.0 59.0 60.0
2020 59.0 59.0 67.0 60.0 60.0 63.0
2021 59.0 60.0 63.0 59.0 59.0 65.0
2022 60.0 60.0 75.5 60.0 62.0 70.0

69 rows × 6 columns

Notice the multi-index:

In [18]:
LevelByYear_medians.info()
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 336 entries, ('a_menor_a_16', 2000, 'F', '1_SIN_SEÑALES') to ('c_mayor_a_50', 2022, 'M', '3_GRAVE')
Data columns (total 1 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   (edad, median)  336 non-null    float64
dtypes: float64(1)
memory usage: 4.5+ KB
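A common sketch for working with such a frame is to flatten the `(column, statistic)` tuples into single names (hypothetical toy data):

```python
import pandas as pd

df = pd.DataFrame({'g': ['x', 'x', 'y'], 'edad': [10, 20, 30]})
stats = df.groupby('g').agg({'edad': ['median', 'mean']})

# the agg() dict produces MultiIndex columns such as ('edad', 'median');
# join each tuple into a flat name
stats.columns = ['_'.join(col) for col in stats.columns]
print(stats.columns.tolist())   # ['edad_median', 'edad_mean']
```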

Here are other mark types, though none is better than the line chart:

In [19]:
alt_dengue=alt.Chart(dengue)
enc_dengue=alt_dengue.encode(
    x='ano:T',
    y=alt.Y('sum(case):Q', scale=alt.Scale(type='log')),
    column='enfermedad_text:N'
)
enc_dengue.mark_circle() 
Out[19]:
In [20]:
alt_dengue=alt.Chart(dengue)
enc_dengue=alt_dengue.encode(
    x='ano:T',
    y=alt.Y('sum(case):Q', scale=alt.Scale(type='log')),
    column='enfermedad_text:N',
)
enc_dengue.mark_rule() 
Out[20]:
In [21]:
alt_dengue=alt.Chart(dengue)
enc_dengue=alt_dengue.encode(
    x='ano:T',
    y=alt.Y('sum(case):Q', scale=alt.Scale(type='log')),
    column='enfermedad_text:N',
)
enc_dengue.mark_bar() 
Out[21]:

Let's do some aggregation:

In [22]:
indexList=['edad_grupos','ano','sexo','enfermedad_text']
aggregator={'edad': ['median','mean','min','max']}
LevelByYear_statsFull=dengue.groupby(indexList,observed=True).agg(aggregator)
LevelByYear_statsFull
Out[22]:
edad
median mean min max
edad_grupos ano sexo enfermedad_text
a_menor_a_16 2000 F 1_SIN_SEÑALES 10.0 9.595202 0 15
M 1_SIN_SEÑALES 10.0 9.688752 0 15
2001 F 1_SIN_SEÑALES 10.0 9.737955 1 15
3_GRAVE 12.0 11.000000 5 15
M 1_SIN_SEÑALES 10.0 9.635155 1 15
... ... ... ... ... ... ... ...
c_mayor_a_50 2022 F 2_ALARMA 60.0 62.661512 51 96
3_GRAVE 75.5 69.208333 52 92
M 1_SIN_SEÑALES 60.0 62.421776 51 103
2_ALARMA 62.0 64.401766 51 97
3_GRAVE 70.0 70.080000 54 91

336 rows × 4 columns

Now, some reshaping:

In [23]:
LevelByYear_statsFull.stack(future_stack=True)
Out[23]:
edad
edad_grupos ano sexo enfermedad_text
a_menor_a_16 2000 F 1_SIN_SEÑALES median 10.000000
mean 9.595202
min 0.000000
max 15.000000
M 1_SIN_SEÑALES median 10.000000
... ... ... ... ... ...
c_mayor_a_50 2022 M 2_ALARMA max 97.000000
3_GRAVE median 70.000000
mean 70.080000
min 54.000000
max 91.000000

1344 rows × 1 columns
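`stack` and `unstack` are inverses. A minimal round trip on a toy frame:

```python
import pandas as pd

wide = pd.DataFrame({'F': [1.0, 2.0], 'M': [3.0, 4.0]}, index=['2021', '2022'])

long = wide.stack()    # Series indexed by (year, sexo) pairs
back = long.unstack()  # restores the original wide shape
assert back.equals(wide)
```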

Mining location

Let's use departamento and provincia:

In [24]:
indexList=['ano','departamento','provincia','enfermedad_text']
aggregator={'case':['sum']}
ByYearPlace=dengue.groupby(indexList,observed=True).agg(aggregator)
ByYearPlace
Out[24]:
case
sum
ano departamento provincia enfermedad_text
2000 AMAZONAS BAGUA 1_SIN_SEÑALES 280
UTCUBAMBA 1_SIN_SEÑALES 61
CAJAMARCA CUTERVO 1_SIN_SEÑALES 2
JAEN 1_SIN_SEÑALES 16
HUANUCO LEONCIO PRADO 1_SIN_SEÑALES 29
... ... ... ... ...
2022 UCAYALI PADRE ABAD 1_SIN_SEÑALES 430
2_ALARMA 87
3_GRAVE 2
PURUS 1_SIN_SEÑALES 1
2_ALARMA 1

2300 rows × 1 columns

Create a wide shape:

In [25]:
#long to wide
ByYearPlace.unstack()
Out[25]:
case
sum
enfermedad_text 1_SIN_SEÑALES 2_ALARMA 3_GRAVE
ano departamento provincia
2000 AMAZONAS BAGUA 280.0 NaN NaN
UTCUBAMBA 61.0 NaN NaN
CAJAMARCA CUTERVO 2.0 NaN NaN
JAEN 16.0 NaN NaN
HUANUCO LEONCIO PRADO 29.0 NaN NaN
... ... ... ... ... ...
2022 TUMBES ZARUMILLA 95.0 5.0 NaN
UCAYALI ATALAYA 611.0 96.0 2.0
CORONEL PORTILLO 3077.0 512.0 23.0
PADRE ABAD 430.0 87.0 2.0
PURUS 1.0 1.0 NaN

1310 rows × 3 columns

In [26]:
# no missing values
ByYearPlace_wide=ByYearPlace.unstack().fillna(0)
ByYearPlace_wide
Out[26]:
case
sum
enfermedad_text 1_SIN_SEÑALES 2_ALARMA 3_GRAVE
ano departamento provincia
2000 AMAZONAS BAGUA 280.0 0.0 0.0
UTCUBAMBA 61.0 0.0 0.0
CAJAMARCA CUTERVO 2.0 0.0 0.0
JAEN 16.0 0.0 0.0
HUANUCO LEONCIO PRADO 29.0 0.0 0.0
... ... ... ... ... ...
2022 TUMBES ZARUMILLA 95.0 5.0 0.0
UCAYALI ATALAYA 611.0 96.0 2.0
CORONEL PORTILLO 3077.0 512.0 23.0
PADRE ABAD 430.0 87.0 2.0
PURUS 1.0 1.0 0.0

1310 rows × 3 columns

The idea is to get the share of people in ALARMA status. For that we need the total cases per row:

In [27]:
sumCases=ByYearPlace_wide.sum(axis=1)
sumCases
Out[27]:
ano   departamento  provincia       
2000  AMAZONAS      BAGUA                280.0
                    UTCUBAMBA             61.0
      CAJAMARCA     CUTERVO                2.0
                    JAEN                  16.0
      HUANUCO       LEONCIO PRADO         29.0
                                         ...  
2022  TUMBES        ZARUMILLA            100.0
      UCAYALI       ATALAYA              709.0
                    CORONEL PORTILLO    3612.0
                    PADRE ABAD           519.0
                    PURUS                  2.0
Length: 1310, dtype: float64
In [28]:
# the share of ALARMA cases per year / departamento / provincia:
shareAlarma=ByYearPlace_wide.loc[:,('case','sum','2_ALARMA')]/sumCases
shareAlarma.name='shareAlarma'
shareAlarma
Out[28]:
ano   departamento  provincia       
2000  AMAZONAS      BAGUA               0.000000
                    UTCUBAMBA           0.000000
      CAJAMARCA     CUTERVO             0.000000
                    JAEN                0.000000
      HUANUCO       LEONCIO PRADO       0.000000
                                          ...   
2022  TUMBES        ZARUMILLA           0.050000
      UCAYALI       ATALAYA             0.135402
                    CORONEL PORTILLO    0.141750
                    PADRE ABAD          0.167630
                    PURUS               0.500000
Name: shareAlarma, Length: 1310, dtype: float64
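`div` with `axis=0` computes the same kind of share for every severity column at once; a sketch on a hypothetical wide table:

```python
import pandas as pd

wide = pd.DataFrame({'1_SIN_SEÑALES': [280.0, 61.0],
                     '2_ALARMA':      [20.0, 0.0],
                     '3_GRAVE':       [0.0, 0.0]},
                    index=['BAGUA', 'UTCUBAMBA'])

# divide every column by the row totals in a single step
shares = wide.div(wide.sum(axis=1), axis=0)
print(shares.loc['BAGUA', '2_ALARMA'])   # 20 / 300
```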

Let's flatten the multi-index into regular columns:

In [29]:
shareAlarma=shareAlarma.reset_index()
shareAlarma
Out[29]:
ano departamento provincia shareAlarma
0 2000 AMAZONAS BAGUA 0.000000
1 2000 AMAZONAS UTCUBAMBA 0.000000
2 2000 CAJAMARCA CUTERVO 0.000000
3 2000 CAJAMARCA JAEN 0.000000
4 2000 HUANUCO LEONCIO PRADO 0.000000
... ... ... ... ...
1305 2022 TUMBES ZARUMILLA 0.050000
1306 2022 UCAYALI ATALAYA 0.135402
1307 2022 UCAYALI CORONEL PORTILLO 0.141750
1308 2022 UCAYALI PADRE ABAD 0.167630
1309 2022 UCAYALI PURUS 0.500000

1310 rows × 4 columns

Let's find the worst province per region in each year:

In [30]:
where = shareAlarma.groupby(['ano','departamento'])['shareAlarma'].idxmax()
worst_prov_year = shareAlarma.loc[where].reset_index(drop=True)
worst_prov_year
Out[30]:
ano departamento provincia shareAlarma
0 2000 AMAZONAS BAGUA 0.000000
1 2000 CAJAMARCA CUTERVO 0.000000
2 2000 HUANUCO LEONCIO PRADO 0.000000
3 2000 JUNIN CHANCHAMAYO 0.000000
4 2000 LA LIBERTAD TRUJILLO 0.000000
... ... ... ... ...
366 2022 PIURA SECHURA 0.214116
367 2022 PUNO CARABAYA 0.000000
368 2022 SAN MARTIN HUALLAGA 0.461538
369 2022 TUMBES CONTRALMIRANTE VILLAR 0.054545
370 2022 UCAYALI PURUS 0.500000

371 rows × 4 columns
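The `groupby` + `idxmax` + `loc` pattern above generalizes to any "pick the best row per group" task; a toy sketch:

```python
import pandas as pd

df = pd.DataFrame({'grupo': ['A', 'A', 'B', 'B'],
                   'valor': [1, 5, 7, 2]})

# idxmax returns, per group, the index label of the row with the maximum 'valor'
where = df.groupby('grupo')['valor'].idxmax()
winners = df.loc[where].reset_index(drop=True)
print(winners.valor.tolist())   # [5, 7]
```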

In [31]:
worst_prov_year.shareAlarma.describe()
Out[31]:
count    371.000000
mean       0.162794
std        0.261947
min        0.000000
25%        0.000000
50%        0.024390
75%        0.247699
max        1.000000
Name: shareAlarma, dtype: float64
In [32]:
# number of distinct provinces that were ever the worst in their region
len(worst_prov_year.provincia.value_counts())
Out[32]:
79
In [33]:
# same count, but only where the alarm share is positive
len(worst_prov_year[worst_prov_year.shareAlarma>0].provincia.value_counts())
Out[33]:
74
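`nunique()` expresses the same count more directly than `len(value_counts())`. For example, on hypothetical province names:

```python
import pandas as pd

s = pd.Series(['MAYNAS', 'SATIPO', 'MAYNAS', 'PIURA'])

# both count the number of distinct values
assert len(s.value_counts()) == s.nunique() == 3
```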

Some filtering:

In [34]:
worst_ProvYear_alarma=worst_prov_year[worst_prov_year.shareAlarma>0].loc[:,['departamento','provincia']]
worst_ProvYear_alarma.reset_index(drop=True,inplace=True)
worst_ProvYear_alarma
Out[34]:
departamento provincia
0 LORETO MAYNAS
1 JUNIN SATIPO
2 LORETO LORETO
3 MADRE DE DIOS MANU
4 PIURA PIURA
... ... ...
198 PASCO OXAPAMPA
199 PIURA SECHURA
200 SAN MARTIN HUALLAGA
201 TUMBES CONTRALMIRANTE VILLAR
202 UCAYALI PURUS

203 rows × 2 columns

In [35]:
indexList=['departamento','provincia']
aggregator={'provincia':['count']}
worst_ProvYear_alarma_Frequency=worst_ProvYear_alarma.groupby(indexList,observed=True).agg(aggregator)
worst_ProvYear_alarma_Frequency
Out[35]:
provincia
count
departamento provincia
AMAZONAS BAGUA 3
CHACHAPOYAS 3
CONDORCANQUI 1
UTCUBAMBA 5
ANCASH CASMA 3
... ... ...
TUMBES ZARUMILLA 4
UCAYALI ATALAYA 2
CORONEL PORTILLO 3
PADRE ABAD 6
PURUS 1

74 rows × 1 columns

The count shows how many years each province was the most affected in its region:

In [36]:
worst_ProvYear_alarma_Frequency.describe()
Out[36]:
provincia
count
count 74.000000
mean 2.743243
std 2.239501
min 1.000000
25% 1.000000
50% 2.000000
75% 3.000000
max 11.000000
In [37]:
# final look
worst_ProvYear_alarma_Frequency.columns=['yearsAffected']
worst_ProvYear_alarma_Frequency=worst_ProvYear_alarma_Frequency[worst_ProvYear_alarma_Frequency.yearsAffected>2]
worst_ProvYear_alarma_Frequency.reset_index(inplace=True)
worst_ProvYear_alarma_Frequency
Out[37]:
departamento provincia yearsAffected
0 AMAZONAS BAGUA 3
1 AMAZONAS CHACHAPOYAS 3
2 AMAZONAS UTCUBAMBA 5
3 ANCASH CASMA 3
4 ANCASH SANTA 4
5 CAJAMARCA JAEN 6
6 CUSCO LA CONVENCION 8
7 HUANUCO HUANUCO 4
8 HUANUCO PUERTO INCA 4
9 JUNIN SATIPO 11
10 LA LIBERTAD TRUJILLO 5
11 LAMBAYEQUE CHICLAYO 3
12 LAMBAYEQUE LAMBAYEQUE 4
13 LIMA LIMA 8
14 LORETO MARISCAL RAMON CASTILLA 4
15 LORETO MAYNAS 4
16 MADRE DE DIOS TAHUAMANU 3
17 MADRE DE DIOS TAMBOPATA 8
18 PASCO OXAPAMPA 11
19 PIURA MORROPON 3
20 SAN MARTIN BELLAVISTA 3
21 SAN MARTIN HUALLAGA 3
22 SAN MARTIN MARISCAL CACERES 3
23 TUMBES TUMBES 7
24 TUMBES ZARUMILLA 4
25 UCAYALI CORONEL PORTILLO 3
26 UCAYALI PADRE ABAD 6

Let's plot:

In [38]:
alt_worstProv=alt.Chart(worst_ProvYear_alarma_Frequency)

enc_worstProv=alt_worstProv.encode(
    y='departamento',
    x='provincia',
    text='yearsAffected:O',
    size='yearsAffected:O'
)

enc_worstProv.mark_text()
Out[38]:

Let's try another view, aggregating at the department level:

In [39]:
indexList=['ano','departamento','enfermedad_text']
aggregator={'case':['sum']}
ByYearDepa=dengue.groupby(indexList,observed=True).agg(aggregator)
ByYearDepa_wide=ByYearDepa.unstack().fillna(0)
ByYearDepaAlarm=ByYearDepa_wide.loc[:,('case','sum','2_ALARMA')]/ByYearDepa_wide.sum(axis=1)
ByYearDepaAlarm.name='alarmShare'

ByYearDepaAlarm=ByYearDepaAlarm.reset_index()
ByYearDepaAlarm
Out[39]:
ano departamento alarmShare
0 2000 AMAZONAS 0.000000
1 2000 CAJAMARCA 0.000000
2 2000 HUANUCO 0.000000
3 2000 JUNIN 0.000000
4 2000 LA LIBERTAD 0.000000
... ... ... ...
366 2022 PIURA 0.117037
367 2022 PUNO 0.000000
368 2022 SAN MARTIN 0.215691
369 2022 TUMBES 0.049793
370 2022 UCAYALI 0.143742

371 rows × 3 columns

In [40]:
ByYearDepaAlarm.describe()
Out[40]:
ano alarmShare
count 371.000000 371.000000
mean 2011.989218 0.077171
std 6.619781 0.121904
min 2000.000000 0.000000
25% 2006.000000 0.000000
50% 2013.000000 0.016651
75% 2018.000000 0.111299
max 2022.000000 1.000000
In [41]:
ByYearDepaAlarm_focus=ByYearDepaAlarm[ByYearDepaAlarm.alarmShare>0]
In [42]:
ByYearDepaAlarm_focus.describe()
Out[42]:
ano alarmShare
count 203.000000 203.000000
mean 2016.596059 0.141037
std 3.698325 0.134789
min 2002.000000 0.000400
25% 2014.000000 0.043466
50% 2017.000000 0.094629
75% 2020.000000 0.217038
max 2022.000000 1.000000
In [43]:
edges=[-1, .10, .25, .5,1]
theLabels=["a.below10%","b.11-25%","c.26-50%","d.above50%"]
ByYearDepaAlarm_focus.loc[:,"alarmLevels"]=pd.cut(ByYearDepaAlarm_focus['alarmShare'],
                                            include_lowest=True,
                                            bins=edges, 
                                            labels=theLabels,
                                            ordered=True)

##
ByYearDepaAlarm_focus.head()
/var/folders/2n/bkfhfqq16r78g3hf7pdj56y40000gn/T/ipykernel_17181/4275939581.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ByYearDepaAlarm_focus.loc[:,"alarmLevels"]=pd.cut(ByYearDepaAlarm_focus['alarmShare'],
Out[43]:
ano departamento alarmShare alarmLevels
33 2002 LORETO 0.000400 a.below10%
143 2010 JUNIN 0.021429 a.below10%
147 2010 LORETO 0.064297 a.below10%
148 2010 MADRE DE DIOS 0.004065 a.below10%
149 2010 PIURA 0.003336 a.below10%
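The `SettingWithCopyWarning` above appears because `ByYearDepaAlarm_focus` is a filtered slice of another frame; taking an explicit `.copy()` before adding columns avoids it. A minimal sketch with toy numbers:

```python
import pandas as pd

df = pd.DataFrame({'share': [0.0, 0.1, 0.6]})

# .copy() makes the filtered frame independent, so new columns are safe to add
focus = df[df.share > 0].copy()
focus['nivel'] = pd.cut(focus['share'], bins=[-1, 0.5, 1], labels=['low', 'high'])
print(focus.nivel.tolist())   # ['low', 'high']
```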
In [44]:
alt_WorstDepa=alt.Chart(ByYearDepaAlarm_focus).encode(x='ano:O',
                                                      y=alt.Y('departamento:N',
                                                              sort=alt.EncodingSortField(field='alarmShare',op='max',order='descending')))
enc1_WorstDepa=alt_WorstDepa.encode(
    color=alt.Color('alarmLevels:O').scale(scheme="lightgreyred", reverse=False)
)

enc1_WorstDepa.mark_rect()
Out[44]:
In [45]:
enc2_WorstDepa=alt_WorstDepa.encode(
    text=alt.Text('alarmShare:Q', format=".1f"),
    opacity=alt.condition('datum.alarmShare >= 0.3', alt.value(1), alt.value(0)))
enc2_WorstDepa.mark_text(fontStyle='bold')
Out[45]:
In [46]:
enc1_WorstDepa.mark_rect() + enc2_WorstDepa.mark_text()
Out[46]:

You can find other available color schemes in the Vega color schemes documentation.